GCMM: graph convolution network based on multimodal attention mechanism for drug repurposing

Background: The main focus of in silico drug repurposing, a promising application of artificial intelligence in drug discovery, is the prediction of drug–disease relationships. Although many computational models have been proposed recently, it is still difficult to predict drug–disease associations reliably from diverse data sources. Results: To identify potential drug–disease associations, this paper introduces a novel end-to-end model called Graph Convolution network based on a Multimodal attention mechanism (GCMM). GCMM integrates known drug–disease relations, drug–drug chemical similarity, drug–drug therapeutic similarity, disease–disease semantic similarity, and disease–disease target-based similarity into a heterogeneous network. A graph convolutional network (GCN) encoder learns embeddings of drugs and diseases from each of these views. GCMM further improves performance with a multimodal attention layer that assigns different weights to the features derived from each source of information. Conclusion: Five-fold cross-validation shows that GCMM outperforms four recently proposed deep-learning models on most evaluation metrics, indicating that it predicts drug–disease relationships reliably. Hyper-parameter analysis and ablation experiments demonstrate the necessity of each module and identify the settings that give the best prediction performance. In a case study on Alzheimer's disease (AD), four of the five drugs that GCMM predicted as most strongly associated with AD are supported by the literature or by experimental studies, demonstrating the viability of GCMM. Together, these results suggest that GCMM can be an effective tool for drug development and repositioning.


Introduction
Developing new drugs still takes a long time, despite technological advances and rising investment in this area [1]. The small number of new drugs approved for sale in recent years cannot meet the healthcare needs of the modern world [2]. Drug repurposing (DR), which seeks new uses for already-approved drugs, has emerged as a promising area of drug discovery and is attracting growing interest [3] as a way to make drug development more efficient and reliable. There are numerous examples of successful drug repurposing. The pharmaceutical industry uses two approaches: in silico DR and activity-based DR [4,5]. Activity-based DR is largely experimental and time-consuming [6]. Meanwhile, the rapid development of biomedical technologies, such as high-throughput screening [7] and next-generation sequencing [8], is generating large amounts of biological data for repositioning at lower cost. Because repurposed drugs have already completed the three phases of clinical trials and their existing information can be queried [9], computational DR is far less expensive and more accessible than experimental techniques [10].
Traditional computational DR methods include feature-matching-based and molecular docking techniques [11]. With the development of artificial intelligence, it has become increasingly feasible to predict links between drugs and diseases and between drugs and proteins [12]. As a result, algorithms have been developed that predict how drugs interact with diseases or other organisms, and their performance keeps improving. Similarity-based algorithms rest on the principle of guilt by association [13], a fundamental idea in DR: the more functionally similar two drugs are, the more likely they are to be associated with the same disorders [14].
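The guilt-by-association principle can be illustrated with a toy scoring rule. The sketch below is not GCMM and not any specific published method; it simply assumes that a drug's score for a disease is the similarity-weighted average of the known links of the other drugs. The function name `gba_scores` and the data are hypothetical.

```python
def gba_scores(drug_sim, assoc):
    """Score every drug-disease pair by the known links of similar drugs.

    drug_sim: n x n nested list of drug-drug similarities (non-negative).
    assoc:    n x m nested list, 1 for a known drug-disease link, else 0.
    Returns an n x m nested list of scores in [0, 1].
    """
    n, m = len(assoc), len(assoc[0])
    scores = [[0.0] * m for _ in range(n)]
    for i in range(n):
        # Only other drugs "vote", so a drug's own links do not inflate its score.
        neighbours = [k for k in range(n) if k != i]
        total = sum(drug_sim[i][k] for k in neighbours)
        if total == 0:
            continue
        for j in range(m):
            votes = sum(drug_sim[i][k] * assoc[k][j] for k in neighbours)
            scores[i][j] = votes / total
    return scores


# Toy data (hypothetical): drug 0 is very similar to drug 1, which treats disease 0.
drug_sim = [[1.0, 0.9, 0.1],
            [0.9, 1.0, 0.2],
            [0.1, 0.2, 1.0]]
assoc = [[0, 0],
         [1, 0],
         [0, 1]]
scores = gba_scores(drug_sim, assoc)
print(scores[0])  # drug 0 scores higher for disease 0 than for disease 1
```

Because drug 0 is far more similar to drug 1 than to drug 2, it inherits a high score for drug 1's disease, which is exactly the intuition that similarity-based DR models build on.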
Prior research in DR has mostly concentrated on machine learning algorithms. Xia et al. [15] proposed Laplacian regularized least squares (LapRLS), a semi-supervised learning technique, to predict drug-protein interactions. Madhukar et al. [16] developed Bayesian ANalysis to Determine Drug Interaction Targets (BANDIT), which predicts drug interactions with particular targets precisely, including identifying targets for a wide range of small molecules and distinguishing different modes of action on the same target. However, most machine learning methods rely heavily on feature engineering and expert knowledge. Deep learning [17], an extension of artificial neural networks, is also widely used in computational DR. Its advantage is that it can learn complex relationships between input features and output decisions from large-scale data. Zeng et al. [18] constructed multiple drug-related networks and integrated them with a multi-modal autoencoder, DeepDR, to learn drug feature representations. The learned drug features and the known drug-disease associations are then fed into a variational autoencoder to predict potential drug-disease associations. In cross-validation and case studies, DeepDR outperforms traditional methods in identifying novel drug-disease associations. The relationships between drugs and diseases form a bipartite graph, which can be extended into a heterogeneous biological network of relationships among drugs, diseases, and drug targets.
As a result, graph embedding approaches, particularly graph neural networks (GNNs) [19], are increasingly applied to this problem. To predict drug-target interactions, Wan et al. [20] developed NeoDTI, which integrates neighborhood information from a heterogeneous network (HN) with a neural network and automatically learns topology-preserving representations of the diverse data in the HN. Yu et al. [21] proposed the layer attention graph convolutional network (LAGCN) for drug-disease association prediction, which aggregates the embeddings of several graph convolution layers with an attention mechanism. Li et al. [36] developed NIMGCN, which applies a graph convolutional network (GCN) separately to a miRNA similarity network and a disease similarity network and adds neural inductive matrix completion to predict miRNA-disease associations.
Although existing computational DR techniques perform remarkably well, several limitations remain. First, some methods consider only drug similarity information and ignore the relationships between diseases. Second, most models treat the multimodal information about drugs and diseases as equally relevant, although in practice different sources contribute differently. To overcome these problems, this paper proposes GCMM, which predicts potential drug-disease associations from multi-source data. First, an HN is built from multi-view drug- and disease-related information, and a GCN encoder produces drug and disease embeddings from the multi-source similarities. Then, instead of being concatenated directly, the features from each source are weighted by an attention mechanism based on global average pooling. Next, a fully connected layer performs further feature learning. Finally, treating the problem as a recommendation task on the HN, matrix completion computes a drug-disease correlation coefficient for each pair. A comparative experiment against four recently proposed deep-learning models shows that GCMM outperforms them on this HN. A case study on predicting potential treatments for AD further demonstrates the improvement and applicability of GCMM.
Overall, the main contributions of this paper can be summarized as follows:
• This study shows that constructing an HN from multiple sources of drug and disease information in open-source databases allows information for in silico DR to be extracted and fused more effectively.
• A novel end-to-end model, GCMM, is proposed; it predicts potential drug-disease associations accurately and outperforms four baseline networks. Analysis of the results supports the accuracy and robustness of GCMM.
• A case study on AD indicates the applicability of GCMM. Furthermore, four of the five drugs (80%) with the highest predicted correlation coefficients are supported by previous research, and the therapeutic potential of Methicillin for AD is further analyzed.

Materials and methods
In this paper, drug-disease association prediction is treated as a recommendation task on an HN with drugs and diseases as nodes and their interactions or relationships as edges. As shown in Fig. 1, this section first describes the HN constructed from multi-source information, consisting of two kinds of drug-drug similarity, two kinds of disease-disease similarity, and experimentally validated drug-disease associations, and then illustrates the workflow of the proposed GCMM framework for predicting drug-disease associations. Figure 1a shows how the HN is built: it includes the known drug-disease associations, drug-drug chemical similarity $G_C$, drug-drug therapeutic similarity $G_T$, disease-disease semantic similarity $G_M$ and disease-disease target-based similarity $G_A$.

The known drug-disease associations
Clinically reported or experimentally verified drug-disease associations from two comprehensive databases, DrugBank [22] and repoDB [23], are integrated to build the HN. The network contains 5159 experimentally verified drug-disease pairs involving 1519 drugs and 728 diseases. Drug and disease names are normalized to standard terms from Medical Subject Headings (MeSH) [24].
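The known associations can be held as a binary drug-by-disease matrix. The sketch below builds such a matrix from (drug, disease) pairs and computes its density; the identifiers are hypothetical, and the full-dataset figure simply divides the reported counts, assuming the 5159 pairs contain no duplicates.

```python
def build_association_matrix(pairs, drugs, diseases):
    """Binary drug x disease matrix with 1 for each known (drug, disease) pair."""
    drug_idx = {d: i for i, d in enumerate(drugs)}
    dis_idx = {s: j for j, s in enumerate(diseases)}
    U = [[0] * len(diseases) for _ in drugs]
    for drug, disease in pairs:
        U[drug_idx[drug]][dis_idx[disease]] = 1
    return U


def density(U):
    """Fraction of drug-disease cells that hold a known association."""
    return sum(map(sum, U)) / (len(U) * len(U[0]))


# Hypothetical identifiers, for illustration only.
toy = build_association_matrix(
    [("drugA", "dis1"), ("drugB", "dis2")], ["drugA", "drugB"], ["dis1", "dis2", "dis3"])
print(density(toy))  # 2 of 6 cells

# Density implied by the counts reported for the full dataset.
full_density = 5159 / (1519 * 728)
print(f"{full_density:.2%}")  # roughly 0.47%
```

Fewer than half a percent of the 1,105,832 possible drug-disease cells are labeled, which is why the dataset is described as sparse.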

Drug-drug chemical similarity
Molecular Access System (MACCS) fingerprints [26] are computed from the SMILES strings of the drugs [27] using Open Babel v2.3.1 [25]. If the MACCS fragment bit-strings of two drug molecules $g_i$ and $g_j$ have $a$ and $b$ bits set, respectively, with $c$ of these bits set in both fingerprints, the chemical similarity [28] of the drug pair is defined as:
$$G_C(g_i, g_j) = \frac{c}{a + b - c} \tag{1}$$
$G_C \in \mathbb{R}^{N_g \times N_g}$ represents the chemical view of the drugs, where $N_g$ is the number of drugs.
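The similarity defined from $a$, $b$ and $c$ is the Tanimoto coefficient $c/(a+b-c)$. The sketch below assumes each fingerprint is already available as a set of "on" bit indices (for example, exported from Open Babel); the bit sets are hypothetical and no Open Babel API is called.

```python
def tanimoto(bits_i, bits_j):
    """Tanimoto similarity c / (a + b - c) of two fingerprints given as sets of 'on' bit indices."""
    a, b = len(bits_i), len(bits_j)
    c = len(bits_i & bits_j)            # bits set in both fingerprints
    denom = a + b - c                   # equals the number of bits set in either fingerprint
    return c / denom if denom else 0.0  # two empty fingerprints: 0.0 by convention


fp_i = {1, 5, 9, 20}   # hypothetical MACCS keys set for drug g_i
fp_j = {5, 9, 33}      # hypothetical MACCS keys set for drug g_j
print(tanimoto(fp_i, fp_j))  # 2 / (4 + 3 - 2) = 0.4
```

Filling an $N_g \times N_g$ matrix with this function over all drug pairs gives the chemical view $G_C$.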

Drug-drug therapeutic similarity
Drug-drug therapeutic similarity is measured by the similarity of the canonical protein sequences of drug targets, which reflects the probability of a therapeutic link between drugs. The canonical protein sequences of Homo sapiens are downloaded from the UniProt database (http://www.uniprot.org/). The protein sequence similarity $T(e_1, e_2)$ of two drug targets $e_1$ and $e_2$ is then computed with the Smith-Waterman algorithm [29], which performs local sequence alignment by comparing segments of all possible lengths and optimizing the similarity measure to find similar regions between the canonical sequences of the two targets. Let $E_1$ and $E_2$ be the target sets of drugs $g_i$ and $g_j$. The overall sequence similarity of the targets of the two drugs is obtained by averaging over all protein pairs $e_1 \in E_1$, $e_2 \in E_2$ with $e_1 \neq e_2$:
$$G_T(g_i, g_j) = \frac{1}{|P_{ij}|} \sum_{(e_1, e_2) \in P_{ij}} T(e_1, e_2), \quad P_{ij} = \{(e_1, e_2) : e_1 \in E_1,\ e_2 \in E_2,\ e_1 \neq e_2\} \tag{2}$$
The matrix $G_T \in \mathbb{R}^{N_g \times N_g}$ can be considered the therapeutic view of the drugs.
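A minimal sketch of the therapeutic similarity, under several assumptions: a simple match/mismatch score with a linear gap penalty stands in for the substitution matrix and gap model a real protein alignment would use (e.g. BLOSUM62 with affine gaps); the raw score is normalized by $\sqrt{SW(s,s)\,SW(t,t)}$ so identical sequences score 1.0; pairs with $e_1 = e_2$ are skipped, following our reading of the averaging condition; and the target IDs and sequences are hypothetical.

```python
from itertools import product


def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of s and t with a linear gap penalty."""
    H = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    best = 0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Local alignment: scores never drop below 0, so an alignment can restart anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def normalized_sw(s, t):
    """Assumed normalization SW(s,t) / sqrt(SW(s,s) * SW(t,t)); identical sequences give 1.0."""
    denom = (smith_waterman(s, s) * smith_waterman(t, t)) ** 0.5
    return smith_waterman(s, t) / denom if denom else 0.0


def therapeutic_similarity(targets_i, targets_j, seqs):
    """Average normalized SW score over pairs e1 in E1, e2 in E2 with e1 != e2."""
    pairs = [(e1, e2) for e1, e2 in product(targets_i, targets_j) if e1 != e2]
    if not pairs:
        return 0.0  # assumed value when no qualifying pair exists
    return sum(normalized_sw(seqs[e1], seqs[e2]) for e1, e2 in pairs) / len(pairs)


# Hypothetical target IDs and toy sequences
seqs = {"P1": "MKTAYIAK", "P2": "MKTAYLAK", "P3": "GGHHWWEE"}
print(therapeutic_similarity({"P1"}, {"P2", "P3"}, seqs))
```

The dynamic-programming table is quadratic in sequence length, so for thousands of full-length proteins an optimized aligner would be used in practice.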

Disease-disease semantic similarity
The National Institutes of Health (NIH) database (http://www.ncbi.nlm.nih.gov/) can be used to study the relationships between diseases. As described in [30], the MeSH descriptor of each disease can be represented as a hierarchical directed acyclic graph (DAG). For a disease $s_i$, $DAG(s_i) = (s_i, N(s_i), \varepsilon(s_i))$, where $N(s_i)$ is the set of nodes containing $s_i$ and its ancestors, and $\varepsilon(s_i)$ is the set of direct links from parent nodes to their child nodes. Following previous work [30], diseases that share a larger part of their DAGs tend to have higher semantic similarity. The contribution of a node $n$ in $DAG(s_i)$ to the semantic value of disease $s_i$ is given by:
$$D_{s_i}(n) = \begin{cases} 1, & n = s_i \\ \max\{\Delta \cdot D_{s_i}(n') \mid n' \in \mathrm{children}(n)\}, & n \neq s_i \end{cases} \tag{3}$$
where $\Delta$ is the semantic contribution decay factor and the children are taken within $DAG(s_i)$. The semantic value of disease $s_i$ is defined as:
$$DV(s_i) = \sum_{n \in N(s_i)} D_{s_i}(n) \tag{4}$$
The semantic similarity of two diseases is defined as:
$$G_M(s_i, s_j) = \frac{\sum_{n \in N(s_i) \cap N(s_j)} \left( D_{s_i}(n) + D_{s_j}(n) \right)}{DV(s_i) + DV(s_j)} \tag{5}$$
where $DV(s_i)$ and $DV(s_j)$ are the semantic values of diseases $s_i$ and $s_j$, respectively. The matrix $G_M \in \mathbb{R}^{N_s \times N_s}$ then represents the semantic view of the diseases, where $N_s$ is the number of diseases.
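The DAG-based semantic similarity can be computed by propagating contributions upward from each disease. This sketch assumes the hierarchy is given as a hypothetical `parents` dictionary mapping each MeSH node to its parent nodes, and uses a decay factor of 0.5, a value commonly used for this measure but assumed here rather than taken from the text.

```python
from collections import deque


def contributions(disease, parents, delta=0.5):
    """D_s(n) for every node n in DAG(s): 1 for s itself; for an ancestor n,
    the maximum of delta * D_s(child) over n's children inside DAG(s)."""
    D = {disease: 1.0}
    queue = deque([disease])
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, ()):
            value = delta * D[child]
            if value > D.get(parent, 0.0):  # keep the largest contribution found so far
                D[parent] = value
                queue.append(parent)
    return D


def semantic_similarity(s_i, s_j, parents, delta=0.5):
    D_i = contributions(s_i, parents, delta)
    D_j = contributions(s_j, parents, delta)
    shared = D_i.keys() & D_j.keys()  # nodes present in both DAGs
    numerator = sum(D_i[n] + D_j[n] for n in shared)
    return numerator / (sum(D_i.values()) + sum(D_j.values()))  # divide by DV(s_i) + DV(s_j)


# Toy hierarchy (hypothetical): A is the root; D has parents B and C; E has parent B.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["B"]}
print(semantic_similarity("D", "E", parents))  # 1.5 / (2.25 + 1.75) = 0.375
```

Here D and E share the ancestors B and A, so their similarity is driven by those shared nodes, while C only enlarges DV(D).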

Disease-disease target-based similarity
The disease-disease target-based similarity is derived from the known associations and reflects the probability of a target link between diseases. The Jaccard similarity [31] is used to measure the similarity of node structure. Let $E_i$ and $E_j$ be the target sets related to diseases $s_i$ and $s_j$; the target-based similarity of the disease pair is defined as:
$$G_A(s_i, s_j) = \frac{|E_i \cap E_j|}{|E_i \cup E_j|} \tag{6}$$
Similarly, the matrix $G_A \in \mathbb{R}^{N_s \times N_s}$ represents the target-based view of the diseases.
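A short sketch of the Jaccard computation, assuming each disease's related targets are available as a Python set; the target labels are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def target_similarity_matrix(target_sets):
    """Symmetric N_s x N_s matrix G_A built from one target set per disease."""
    n = len(target_sets)
    return [[jaccard(target_sets[i], target_sets[j]) for j in range(n)] for i in range(n)]


# Hypothetical target sets for three diseases
targets = [{"T1", "T2"}, {"T2", "T3"}, {"T4"}]
G_A = target_similarity_matrix(targets)
print(G_A[0])  # [1.0, 0.333..., 0.0]
```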

Model architecture
Based on the HN constructed above, a novel end-to-end graph neural network framework, GCMM, is proposed to identify potential drug-disease associations. The model consists mainly of an encoder and a decoder. More specifically, as shown in Fig. 1b-f, GCMM has four main modules, detailed below: a two-layer multi-view GCN encoder, a multimodal attention mechanism, a fully connected feature extractor, and a matrix completion decoder.

Multi-view GCN encoder
Convolutional neural networks (CNNs) [32] have been widely used in fields such as computer vision, speech recognition, and natural language processing. However, CNNs cannot be applied directly to data in non-Euclidean spaces such as graphs. GCN [33] is a typical spectral model that combines graph convolution with neural networks for semi-supervised node classification. In particular, GCN uses the Laplacian matrix of a graph to define a Fourier transform in the frequency domain and derives graph convolution by analogy with convolution in Euclidean space carried out in the frequency domain. In applications, GCN and its variants have substantially improved many network-related prediction tasks, such as predicting the properties and structure of small biological molecules. In GCMM, a multi-view GCN encoder is applied to the four similarity networks to learn low-dimensional representations of drugs and diseases. As Fig. 2 shows, the GCN encoder updates node features by aggregating information from each node's neighbors in the graph. For a drug similarity view, the layer-wise update is:
$$X^{(l+1)} = \phi\left(\hat{A} X^{(l)} W^{(l)}\right) \tag{7}$$
where $X^{(l+1)} \in \mathbb{R}^{N_g \times F_g}$ denotes the $F_g$-dimensional features of the $N_g$ drugs in the $(l+1)$-th GCN layer, $\phi(\cdot)$ is a nonlinear activation function, $X^{(0)}$ is randomly initialized, and $W^{(l)}$ is a learnable parameter matrix. $A$ denotes the adjacency matrix of the similarity graph $G$ ($G_C$ or $G_T$), and
$$\hat{A} = D^{-1/2} A D^{-1/2} \tag{8}$$
is its symmetrically normalized adjacency matrix, where $D$ is a diagonal matrix with $D_{ii} = \sum_j A_{ij}$. Analogously, disease node features are obtained from the similarity graphs $G_M$ and $G_A$:
$$Y^{(l+1)} = \phi\left(\hat{A} Y^{(l)} W^{(l)}\right) \tag{9}$$
where $\hat{A}$ is now computed from the corresponding disease similarity graph. Applying the multi-layer GCN encoder to each similarity graph yields drug and disease embeddings from the different views, $X_C$, $X_T$, $Y_M$ and $Y_A$.
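A minimal NumPy sketch of one similarity view passing through a two-layer GCN encoder, assuming the standard propagation rule $X^{(l+1)} = \mathrm{ReLU}(D^{-1/2} A D^{-1/2} X^{(l)} W^{(l)})$ from [33], with the similarity matrix used directly as $A$. The ReLU activation, the toy sizes and the random weights are assumptions for illustration, not GCMM's trained configuration.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^{-1/2} A D^{-1/2}, where D is diagonal with D_ii = sum_j A_ij."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep a zero row/column
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def gcn_encoder(G, X0, weights):
    """Stack GCN layers X^{(l+1)} = ReLU(A_hat X^{(l)} W^{(l)}) over one similarity graph."""
    A_hat = normalize_adjacency(G)
    X = X0
    for W in weights:
        X = np.maximum(A_hat @ X @ W, 0.0)  # ReLU is an assumed choice of activation
    return X


rng = np.random.default_rng(0)
n_drugs, dim = 5, 8
G_C = rng.random((n_drugs, n_drugs))
G_C = (G_C + G_C.T) / 2                                          # toy symmetric "chemical similarity"
X0 = rng.normal(size=(n_drugs, dim))                             # randomly initialized X^(0)
weights = [0.1 * rng.normal(size=(dim, dim)) for _ in range(2)]  # two layers
X_C = gcn_encoder(G_C, X0, weights)
print(X_C.shape)  # (5, 8)
```

Running the same function on $G_T$, $G_M$ and $G_A$ (with their own weights) would give the other three view embeddings.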

Multimodal based attention mechanism
The attention mechanism [34] is inspired by human perception, which focuses on the most distinctive parts when processing large amounts of information. A model with more parameters is more expressive and can store more information, but this also introduces information overload. Attention mechanisms address this problem, and improve both the efficiency and the accuracy of task processing, by focusing on the information most relevant to the current task, reducing attention to other information, and filtering out irrelevant information. Attention has become one of the most important concepts in deep learning.
In GCMM, the multimodal attention layer is introduced after the multi-view features are obtained. As shown in Fig. 3, it enables the model to distinguish between the multi-source inputs and assign them different weights. Global average pooling is used to compute the weight of each embedding. For drugs with $F_g^{in}$ input channels (in this article $F_g^{in} = 2$, one per view), the channel statistic $Z_g \in \mathbb{R}^{1 \times 1 \times F_g^{in}}$ is computed from the drug features $X \in \mathbb{R}^{F_g \times N_g \times F_g^{in}}$. For the chemical feature of the drugs $X_C$, the channel statistic $z_c$ is defined as:
$$z_c = \frac{1}{F_g N_g} \sum_{i=1}^{F_g} \sum_{j=1}^{N_g} X_C(i, j) \tag{10}$$
The attention weights of all channels are computed as:
$$Z^{att} = \sigma\left(W_2\, \delta\left(W_1 Z_g\right)\right) \tag{11}$$
where $\delta(\cdot)$ and $\sigma(\cdot)$ denote the ReLU and Sigmoid activation functions, respectively, and $W_1$, $W_2$ are trainable parameters. The multimodal attention is $Z^{att} = [z_c^{att}, z_t^{att}]$. Finally, the features of each view are rescaled by their corresponding weight coefficients; for drugs in the chemical and therapeutic views:
$$\tilde{X}_C = z_c^{att} \cdot X_C \tag{12}$$
$$\tilde{X}_T = z_t^{att} \cdot X_T \tag{13}$$
In the same way, this module produces attention-weighted embeddings $\tilde{X}_C$, $\tilde{X}_T$, $\tilde{Y}_M$ and $\tilde{Y}_A$ for drugs and diseases from the different views. The drug channel embedding is denoted $X = [\tilde{X}_C, \tilde{X}_T]$ and the disease channel embedding $Y = [\tilde{Y}_M, \tilde{Y}_A]$.
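The view-level attention follows the squeeze-and-excitation pattern: average-pool each view to one number, pass the vector of view statistics through a small two-layer network, and rescale each view by the resulting weight. This NumPy sketch assumes the form $\mathrm{sigmoid}(W_2\,\mathrm{ReLU}(W_1 z))$; the hidden width of 4 and the random toy weights are arbitrary assumptions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multimodal_attention(views, W1, W2):
    """views: list of V embedding arrays (one per similarity view).

    Squeeze: global average pooling gives one statistic per view.
    Excite:  sigmoid(W2 @ relu(W1 @ z)) gives one weight in (0, 1) per view.
    Each view is then rescaled by its weight.
    """
    z = np.array([v.mean() for v in views])  # shape (V,)
    z_att = sigmoid(W2 @ relu(W1 @ z))       # shape (V,)
    weighted = [w * v for w, v in zip(z_att, views)]
    return weighted, z_att


rng = np.random.default_rng(1)
X_C = rng.normal(size=(5, 8))  # toy chemical-view drug embeddings
X_T = rng.normal(size=(5, 8))  # toy therapeutic-view drug embeddings
W1 = rng.normal(size=(4, 2))   # hidden width 4 is an arbitrary choice
W2 = rng.normal(size=(2, 4))
(Xc_att, Xt_att), z_att = multimodal_attention([X_C, X_T], W1, W2)
print(z_att)  # two per-view weights, each between 0 and 1
```

Because the weights come from a sigmoid rather than a softmax, each view is scaled independently and the weights need not sum to one.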

Fully connected feature extractor
The fully connected layer synthesizes the information extracted by the previous modules. Here it integrates the information from multiple views and generates the final embedding. Given the drug channel embedding $X = [X_C, X_T]$ with $V_g$ views, each output channel is computed as:
$$Lin_X = W_X^{\top} X \tag{14}$$
where, for each output channel, $X$ denotes the corresponding $V_g \times N_g$ slice of the stacked view embeddings, $W_X \in \mathbb{R}^{V_g \times 1}$ is a learnable parameter, and $Lin_X \in \mathbb{R}^{1 \times N_g}$ is the output for the drug embedding. The final drug feature $X' \in \mathbb{R}^{F_g^{out} \times N_g}$ is obtained by stacking the outputs of the $F_g^{out}$ channels. Analogously, the final disease embedding $Y'$ is obtained.
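One possible reading of the fusion step, suggested by the stated shapes $W_X \in \mathbb{R}^{V_g \times 1}$ and $Lin_X \in \mathbb{R}^{1 \times N_g}$, is that each output channel takes a learned weighted sum over the views of the corresponding feature row, and the rows are stacked. The sketch below implements that reading with NumPy; it assumes the number of output channels equals the feature dimension, and the toy arrays are hypothetical.

```python
import numpy as np


def fuse_views(views, W):
    """Combine V view embeddings into one final embedding.

    views: list of V arrays of shape (F, N), F features by N nodes.
    W:     array of shape (F, V); row k plays the role of W_X for output channel k.
    Output channel k is the 1 x N row W[k] @ [views[0][k]; ...; views[V-1][k]],
    and stacking the F rows gives the final (F, N) embedding.
    """
    stacked = np.stack(views, axis=0)           # (V, F, N)
    return np.einsum("fv,vfn->fn", W, stacked)  # weighted sum over views, per channel


rng = np.random.default_rng(2)
X_C = rng.normal(size=(8, 5))  # toy attention-weighted views: 8 features x 5 drugs
X_T = rng.normal(size=(8, 5))
W = rng.normal(size=(8, 2))    # hypothetical learned view weights
X_final = fuse_views([X_C, X_T], W)
print(X_final.shape)  # (8, 5)
```

With all weights equal to one, the fusion reduces to summing the views, which is a quick sanity check of the contraction.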

Matrix completion decoder
The drug and disease embeddings learned by the encoder are fed into the matrix completion module, and the association prediction problem is treated as a recommendation task. The predicted association matrix $U \in \mathbb{R}^{N_g \times N_s}$ is defined as:
$$U = X'^{\top} Y' \tag{15}$$
where $U_{ij}$ is the degree to which drug $i$ is associated with disease $j$. GCMM aims to minimize the Frobenius norm of the difference between $U$ and the experimentally verified label matrix $U'$. The loss function of the model is defined as:
$$\mathcal{L} = \left\| U - U' \right\|_F^2 \tag{16}$$
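A minimal sketch of the decoder and loss, assuming a plain inner-product decoder $U = X'^{\top} Y'$ with no output activation and the squared Frobenius norm as the training objective; the embeddings and the single known association are toy values.

```python
import numpy as np


def decode(X_final, Y_final):
    """Inner-product decoder: (F, N_g) drug and (F, N_s) disease embeddings -> (N_g, N_s) scores."""
    return X_final.T @ Y_final


def frobenius_loss(U, U_label):
    """Squared Frobenius norm ||U - U'||_F^2."""
    return float(np.sum((U - U_label) ** 2))


rng = np.random.default_rng(3)
X_final = rng.normal(size=(8, 5))  # 5 drugs
Y_final = rng.normal(size=(8, 3))  # 3 diseases
U = decode(X_final, Y_final)       # shape (5, 3)
U_label = np.zeros((5, 3))
U_label[0, 1] = 1.0                # one toy known association
print(U.shape, frobenius_loss(U, U_label))
```

In training, this loss would be minimized with respect to all encoder, attention and fusion parameters, for example with Adam as in the experiment settings.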

Experiment settings
Known drug-disease association pairs are taken as positive samples and all other pairs as negative instances. Because the dataset is sparse, five-fold cross-validation (5FCCV) is used to evaluate prediction performance on all positive samples together with an equal number of randomly selected negative instances. In each round, one subset serves as the validation set and the others as the training set. All experiments are conducted on a single GeForce RTX 2080 Ti GPU with 11 GB of memory under Linux. The Adam optimizer [35] is used to minimize the loss during training, with 1000 training epochs and a learning rate of 0.001. The area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall (PR) curve (AUPR) are chosen as the primary evaluation metrics. In addition, threshold-based metrics are calculated: Recall (also known as sensitivity), Accuracy (ACC), Precision and F1-measure (F1). With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, these metrics are:
$$Recall = \frac{TP}{TP + FN} \tag{17}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
$$Precision = \frac{TP}{TP + FP} \tag{19}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{20}$$
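The evaluation protocol can be sketched in plain Python: draw one equal-size set of random negatives, split the shuffled samples into five folds, and compute the threshold-based metrics. It assumes negatives are sampled once before splitting and that folds are formed by striding through the shuffled list; the toy label matrix is hypothetical.

```python
import random


def balanced_samples(U, seed=0):
    """All known pairs as positives plus an equal number of random unknown pairs as negatives."""
    pos = [(i, j) for i, row in enumerate(U) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(U) for j, v in enumerate(row) if v == 0]
    rng = random.Random(seed)
    neg = rng.sample(unknown, len(pos))  # requires len(unknown) >= len(pos)
    samples = [(p, 1) for p in pos] + [(q, 0) for q in neg]
    rng.shuffle(samples)
    return samples


def k_fold_splits(samples, k=5):
    """Yield (train, validation) lists; each sample is validated exactly once."""
    folds = [samples[f::k] for f in range(k)]
    for f in range(k):
        train = [s for g, fold in enumerate(folds) if g != f for s in fold]
        yield train, folds[f]


def threshold_metrics(y_true, y_pred):
    """Recall, accuracy, precision and F1 for binary labels and binary predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    acc = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "acc": acc, "precision": precision, "f1": f1}


U = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # toy label matrix with 3 known pairs
samples = balanced_samples(U)
splits = list(k_fold_splits(samples, k=5))
print(threshold_metrics([1, 1, 0, 0], [1, 0, 0, 1]))  # every metric equals 0.5
```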

Performance of GCMM on the cross-validation
Table 1 reports the average of ten experiments. The results show that GCMM accurately predicts drug-disease associations and performs robustly on the dataset: the average AUC is about 0.90 and the average AUPR is about 0.91. In addition, the deviation across folds is low, which indicates that the model is stable.

Baseline methods and performance comparison
Four recently proposed deep-learning models, DeepDR, NeoDTI, LAGCN, and NIMGCN [18,20,21,36], are chosen as baselines to evaluate GCMM's performance. Like GCMM, they are similarity-based deep-learning models. All comparison models use the same training and testing sets as GCMM, and their hyperparameters are tuned so that each model fits the data well. First, the models are compared using the same training dataset as GCMM with a 1:1 ratio of positive to negative samples; the average results of ten trials are shown in Table 2. ROC and PR curves are also drawn to evaluate prediction performance. As shown in Fig. 4a, the ROC curve shows how the true positive rate (TPR) and false positive rate (FPR) change as the decision threshold varies; a model with better classification performance has a larger AUC. As shown in Fig. 4b, the PR curve shows how precision and recall change across thresholds; a larger AUPR indicates a better model. Next, cross-validation is performed on all pairs, both positive and negative, which mimics the practical situation in which drug-disease pairs are sparsely labeled. As Table 3 shows, GCMM clearly outperforms the baseline methods, with significant improvements on most metrics.
GCMM outperforms the other models on the two primary metrics, and its results on the other metrics are also more stable than those of the other methods. The advantage of GCMM can be attributed to the following points:
• The graph convolutional network extracts features from the similarity graphs and fuses heterogeneous information effectively.
• The multimodal attention mechanism is introduced to process multimodal information, which is especially helpful for the complex drug-disease network.
• The fully connected layer further extracts features effectively.
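The ROC and PR curves described in this section can be computed directly from ranked prediction scores. The sketch below sweeps the threshold over the sorted scores, integrates the ROC curve with the trapezoidal rule, and uses average precision (the mean precision at the rank of each positive) as a step-wise estimate of AUPR; it assumes both classes are present, and the average-precision function assumes no tied scores. The labels and scores are toy values.

```python
def roc_curve_auc(y_true, scores):
    """ROC points (FPR, TPR) swept over score thresholds, and AUC by the trapezoidal rule."""
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last of any tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            tpr.append(tp / n_pos)
            fpr.append(fp / n_neg)
    auc = sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))
    return fpr, tpr, auc


def average_precision(y_true, scores):
    """Step-wise area under the PR curve: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank
    return total / sum(y_true)


y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
fpr, tpr, auc = roc_curve_auc(y, s)
print(auc, average_precision(y, s))  # 0.75 and about 0.833
```

The AUC of 0.75 matches the fraction of positive-negative pairs ranked correctly (three of four), which is the usual probabilistic interpretation of AUC.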

Model ablation experiment
This section uses two variants of GCMM in an ablation experiment to verify the contribution of each module of the model.
GCMM without the attention layer (GCMM_no_att) is used to determine whether the multimodal attention layer improves the model's predictive performance. As Table 4 and Fig. 5 show, the attention mechanism improves the performance of GCMM by roughly 3%. Comparing GCMM with GCMM without the linear layer (GCMM_no_lin), Fig. 5 shows that using the linear layer to further extract the embedding improves the metrics by about 4%. The model's high prediction accuracy results from the combination of all its modules.

The ablation of multi-source information
To verify the importance of multi-source information, ablation experiments are run with single and multiple information sources. Table 5 shows the results for all combinations of the multi-source information. The combination $G_C + G_T + G_M + G_A$ is clearly superior to the other combinations on most metrics. Specifically, it exceeds the best combination that uses a single similarity on each side, $G_C + G_M$, by 3.0% on AUC and AUPR. Furthermore, it exceeds $G_T + G_M + G_A$ by 2.3% on AUC and AUPR.

Hyper-parameter Analysis
Four important hyper-parameters, namely the number of GCN layers, the embedding size, the number of output channels, and the learning rate, are examined experimentally to assess their impact on model performance.
• Fig. 6a shows that three GCN layers give the lowest performance, which can be attributed to over-smoothing, a known limitation of GNNs [37]. The result with one GCN layer suggests that a shallow GCN cannot propagate node features far enough to fuse heterogeneous information. GCMM performs best with two GCN layers.
• The embedding size directly affects the performance of GCMM. In the experiment, the embedding size is varied over {32, 64, 128, 256, 512} dimensions, and its effect is shown in Fig. 6b. Fig. 6c shows that AUC and AUPR are highest with 128 output channels.
• The learning rate sets the step size of each parameter update when the loss function is optimized during training, and its value determines whether the model can reach an optimal result. If the learning rate is too high, the parameters oscillate around the minimum; if it is too small, they converge slowly. Fig. 6d shows that the optimal learning rate for the model is 0.001.
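Over-smoothing can be seen without any training: repeatedly applying the normalized propagation $D^{-1/2} A D^{-1/2}$ drives node features toward a common vector, so nodes become indistinguishable. The NumPy sketch below omits the learnable weights and activations of a real GCN and uses a hypothetical 6-node ring with self-loops, where every node has the same degree, so all rows converge to the same vector.

```python
import numpy as np


def cycle_with_self_loops(n):
    """Adjacency of an n-node ring graph with self-loops (every node has degree 3)."""
    A = np.eye(n)
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1.0
    return A


def feature_spread(X):
    """Largest per-feature gap between any two nodes; 0 means all nodes look identical."""
    return float(np.max(X.max(axis=0) - X.min(axis=0)))


A = cycle_with_self_loops(6)
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A D^{-1/2}
X = np.random.default_rng(0).normal(size=(6, 3))
spread = {k: feature_spread(np.linalg.matrix_power(A_hat, k) @ X) for k in (1, 2, 5, 50)}
print(spread)  # the spread is close to 0 for k = 50
```

A few propagation steps mix useful neighborhood information, but many steps erase the differences between nodes, which is consistent with deeper GCN stacks performing worse.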

New drugs predicted for AD
To further assess the quality of GCMM's novel predictions, a case study evaluates new drug-disease pairs against the literature. Specifically, GCMM is applied to predict candidate drugs for AD. AD is currently the most common neurodegenerative disease [38]; it is characterized by dementia, and its etiology is unknown. Drug repurposing to find treatments for AD is therefore of great value. After the predicted scores of all drug-disease pairs are computed, all known drug-disease associations in the dataset are excluded and the remaining candidate drugs for AD are ranked by predicted score. Table 6 shows the top five predicted candidate drugs for AD; four of them (80%) have supporting evidence in the literature. Dexamethasone ((11β, 16α)-9-Fluoro-11) has the highest predicted correlation coefficient with AD. [39] showed that dexamethasone levels are an important consideration in AD, and [40] indicates that combining acyclovir with dexamethasone might be an alternative therapy for AD. The second candidate is Cysteamine, a small molecule that is the decarboxylated derivative of the amino acid cysteine and has characteristics desirable in drugs targeting neurodegeneration. In [41], chronic cysteamine treatment improved habituation and spatial learning deficits in the APP-Psen1 mouse model of AD. The third candidate, Aripiprazole, is a novel antipsychotic. [42] compared the efficacy and safety of aripiprazole with placebo in patients with psychosis associated with AD, [43] conducted a double-blind trial for the treatment of psychosis in nursing home patients with AD, and [44] reviewed randomized controlled trials of aripiprazole for AD-related psychosis and reported a therapeutic effect.
The fourth candidate, Rifapentine (RIF), is an antibiotic used to treat tuberculosis; it also prevents curli-dependent adhesion and biofilm formation in E. coli at concentrations below those that affect viability [45]. [46] reported the first direct quantification of RIF in rat brain homogenate and, while studying amyloid-β clearance, found that RIF crosses the blood-brain barrier and has a protective effect in AD; further in vivo studies are under way.
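The candidate-selection step of this case study (exclude known associations, then rank by predicted score) can be sketched as follows. The function name `top_k_novel` and the toy score matrix are hypothetical; in the study, the scores would come from GCMM's predicted matrix $U$ and the known links from the label matrix.

```python
def top_k_novel(U_pred, U_known, disease, k=5):
    """Indices of the k drugs with the highest predicted score for one disease,
    skipping drugs already known to be associated with it."""
    candidates = [(U_pred[i][disease], i)
                  for i in range(len(U_pred)) if U_known[i][disease] == 0]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [i for _, i in candidates[:k]]


U_pred = [[0.9], [0.8], [0.3], [0.7]]  # toy scores for 4 drugs x 1 disease
U_known = [[1], [0], [0], [0]]         # drug 0 is already a known treatment
print(top_k_novel(U_pred, U_known, disease=0, k=2))  # [1, 3]
```

Filtering before ranking guarantees that the list always contains k genuinely new candidates (when enough exist), rather than fewer after known pairs are removed.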

Properties analysis of Meticillin
Because no association between Meticillin and AD has been reported in the literature or demonstrated experimentally, this section analyzes the properties of Meticillin (methicillin) and its similarity to the newly predicted drugs. Methicillin is a penicillinase-resistant penicillin whose antibacterial action is similar to that of penicillin [47]. Its molecular formula is C17H20N2O6S, and its chemical structure is shown in Fig. 7. Methicillin is mainly used to treat infections caused by penicillin-resistant staphylococci, such as sepsis, respiratory tract infections, meningitis and soft tissue infections; it can also be used for mixed infections involving Streptococcus pyogenes or pneumococci together with penicillin-resistant staphylococci [47].

Conclusion
Predicting potential drug-disease relationships is an important research field in computational drug repurposing, helping to improve drug utilization and guide clinical application. This paper proposes a novel model, GCMM, for identifying potential drug-disease associations. First, GCMM uses GCN encoders to fuse topological information from multiple drug and disease similarities in the HN. Second, unlike existing methods that assign the same weight to each source, it applies a multimodal attention mechanism to integrate multi-source information. After a fully connected layer, a matrix completion decoder produces a correlation coefficient for each drug-disease pair. Experimental results under 5FCCV show that GCMM outperforms four similarity-based deep-learning models, DeepDR, NeoDTI, LAGCN, and NIMGCN [18,20,21,36], on most metrics, with higher accuracy. In addition, a case study on potential AD therapeutics illustrates a concrete application and supports the medical relevance of GCMM. Together, these results and the novel drug-disease associations it predicts indicate that GCMM is effective and robust for drug repurposing. In future research, a worthwhile direction is to improve the reliability and diversity of biological information given the sparsity of biological data. Moreover, additional biological components implicated in the drug treatment of diseases, such as proteins, miRNAs, and biological processes, can be added to the HN.